208 PART 5 Looking for Relationships with Correlation and Regression


» Test for a significant association or relationship between two or more variables. The process is similar to correlation, but is more generalized, producing a unique equation or formula relating the variables.

» Get a compact representation of your data. A well-fitting regression model succinctly summarizes the relationships between the variables in your data.

» Make precise predictions, or prognoses. With a properly fitted survival function (see Chapter 23), you can generate a customized survival curve for a newly diagnosed cancer patient based on that patient’s age, gender, weight, disease stage, tumor grade, and other factors to predict how long they will live. A bit morbid, perhaps, but you could certainly do it.

» Do mathematical manipulations easily and accurately on a fitted function that may be difficult or inaccurate to do graphically on the raw data. These include making estimates within the range of the measured values (called interpolation) as well as outside the range of the measured values (called extrapolation, and considered risky). You may also want to smooth the data, which is described in Chapter 19.

» Obtain numerical values for the parameters that appear in the regression model formula. Chapter 19 explains how to make a regression model based on a theoretical rather than known statistical distribution (described in Chapter 3). Such a model is used to develop estimates like the ED50 of a drug, which is the dose that produces one-half the maximum effect.
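
The interpolation and extrapolation mentioned in the list above can be sketched in a few lines of Python. This is only a minimal illustration, assuming a straight-line model fitted by ordinary least squares. The dose and response numbers, and the names fit_line and predict, are invented for this example and are not taken from any real study.

```python
# Hypothetical data: five measured doses (X) and responses (Y), made up for illustration.
doses = [1.0, 2.0, 3.0, 4.0, 5.0]
responses = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_line(x, y):
    """Return (intercept, slope) of the ordinary least-squares straight line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x_new):
    """Evaluate the fitted line at a new X value."""
    return intercept + slope * x_new


intercept, slope = fit_line(doses, responses)

# 2.5 lies inside the measured range (1 to 5); 8.0 lies outside it.
for x_new in (2.5, 8.0):
    inside = min(doses) <= x_new <= max(doses)
    kind = "interpolation" if inside else "extrapolation (risky)"
    estimate = predict(intercept, slope, x_new)
    print(f"dose {x_new}: predicted response {estimate:.2f} ({kind})")
```

The same fitted function produces both estimates. The difference is that nothing in the data confirms the straight-line pattern still holds at a dose of 8, which is why extrapolation is considered risky.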

Talking about terminology and mathematical notation

A regression model is a formula that describes how one variable, the dependent variable, depends on one or more other variables, and on one or more parameters. (While it is technically possible to have more than one dependent variable in a model, a discussion of this type of regression is outside the scope of this book.)

The dependent variable is also called the outcome, and the other variables are called independent variables or predictors. Parameters are the other terms that appear in the formula. Their values, which are determined by the statistical software you are using, make the function come as close as possible to the observed data.
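
To make the roles of these terms concrete, here is a small Python sketch. It assumes a straight-line model with two parameters, and it measures closeness by the sum of squared differences between the observed and predicted outcomes. That is one common criterion, not necessarily the one your software uses. The data and the names model and sum_of_squares are hypothetical.

```python
# Hypothetical data, invented for illustration.
predictor = [1.0, 2.0, 3.0, 4.0]   # independent variable
outcome = [3.0, 5.0, 7.0, 9.0]     # dependent variable


def model(x, a, b):
    """Regression model: the outcome depends on the predictor x and on parameters a and b."""
    return a + b * x


def sum_of_squares(a, b):
    """Total squared gap between the observed outcomes and the model's predictions."""
    return sum((y - model(x, a, b)) ** 2 for x, y in zip(predictor, outcome))


# Two candidate sets of parameter values: the second follows the data more closely.
poor_fit = sum_of_squares(a=0.0, b=1.0)
good_fit = sum_of_squares(a=1.0, b=2.0)
print(f"a=0, b=1 gives a discrepancy of {poor_fit:.1f}")
print(f"a=1, b=2 gives a discrepancy of {good_fit:.1f}")
```

Statistical software automates this comparison by searching for the parameter values that make the discrepancy as small as possible.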

If you have only one independent variable, it’s often designated by X, and the

dependent variable is designated by Y. If you have more than one independent

variable, variables are usually designated by letters toward the end of the alphabet

(W, X, Y, Z). Parameters are often designated by letters toward the beginning of the